feat: application sidekicks = non-HTTP workers with shared state #2287
nicolas-grekas wants to merge 1 commit into php:main from
Conversation
|
Interesting approach to parallelism. What would be a concrete use case for only letting information flow one way, from the sidekick to the HTTP workers? Usually the flow is inverted: an HTTP worker offloads work to a pool of 'sidekick' workers and can optionally wait for a task to complete. |
|
Thank you for the contribution. Interesting idea, but I'm thinking we should merge the approach with #1883. The kind of worker is the same, how they are started is but a detail. @nicolas-grekas the Caddyfile setting should likely be per |
|
@AlliBalliBaba The use case isn't task offloading (HTTP->worker), but out-of-band reconfigurability (environment->worker->HTTP). Sidekicks observe external systems (Redis Sentinel failover, secret rotation, feature flag changes, etc.) and publish updated configuration that HTTP workers pick up on their next request; with per-request consistency guaranteed via Task offloading (what you describe) is a valid and complementary pattern, but it solves a different problem. The non-HTTP worker foundation here could support both.
@henderkes Agreed that the underlying non-HTTP worker type overlaps with #1883. The foundation (skip HTTP startup/shutdown, immediate readiness, cooperative shutdown) is the same. The difference is the API layer and the DX goals:
Happy to follow up with your proposals now that this is hopefully clarified. |
|
Great PR! Couldn't we create a single API that covers both use cases? We try to keep the number of public symbols and config options as small as possible! |
Yes, that's why I'd like to unify the two APIs and background implementations into one. Unfortunately the first task worker attempt didn't make it into |
|
The PHP-side API has been significantly reworked since the initial iteration: I replaced The old design used
Key improvements:
Other changes:
|
|
Thanks @dunglas and @henderkes for the feedback. I share the goal of keeping the API surface minimal. Thinking about it more, the current API is actually quite small and already general:
The name "sidekick" works as a generic concept: a helper running alongside. The current set_vars/get_vars protocol covers the config-publishing use case. For task offloading (HTTP->worker) later, the same sidekick infrastructure could support:
Same worker type, same So the path would be:
The foundation (non-HTTP threads, cooperative shutdown, crash recovery, per-php_server scoping) is shared. Only the communication primitives differ. WDYT? |
|
|
|
Hmm, it seems they are on some versions, for example here: https://github.com/php/frankenphp/actions/runs/23192689128/job/67392820942?pr=2287#step:10:3614 For the cache, I'm not aware of a GitHub feature that allows clearing everything, unfortunately 🙁 |
My only worry with this is that "sidekick" implies that there's a "main" character related to it. That's the case here, but wouldn't necessarily be the case for task or extension workers. Other than the naming, I don't object to the API. |
|
If we want to unify these concepts, the sidekick workers should probably be configured like regular workers and started always from the Caddy config (very messy to let http workers start and stop them). Just a The |
Absolutely agree, though I believe it would be good to be able to start them from HTTP code and establish communication through channels of some sort.
Funny you say that, I created one over the last few days, but I quickly figured that the Cgo overhead and Go generally being much slower than optimised C made this a bit of a futile attempt. Ristretto backend was ~4x slower than apcu/direct C copying. If we want to implement a FrankenPHP\Store it will have to be well-thought out and implemented in pure C. |
|
All green, ready on my side!🎳
@henderkes Thanks for validating the CGo overhead concern. A proper Glad we agree on starting from PHP! That's the current design: get_vars implicitly starts the sidekick on first call.
@AlliBalliBaba About APCu: I want sidekicks to be a core feature of FrankenPHP that people can reliably build on, not something depending on an optional third-party extension with its own bugs and serialization overhead. The sidekick API solves a different problem than a shared store: lifecycle management (start, wait-for-ready, crash recovery, shutdown), blocking first call (no polling), and per-sidekick scoping. String-only is by design; config updates don't need complex types, and explicit serialization is always available.
It's actually clean. Every HTTP worker has a bootstrap phase before entering the frankenphp_handle_request loop. That's where get_vars should be called. At-most-once semantics make it safe: all workers call
About naming: my preference goes to "sidekick" precisely because these workers ARE secondary to the main app. A Messenger consumer is a different pattern (standalone queue processor). Sidekicks observe the environment and support the main app. The name is also memorable and distinct from the existing worker concept.
Caddy-config-based workers could be added later as a complementary approach for ops-level control, although I'm not sure that'd provide the best DX for app developers. They will prefer not touching the Caddyfile when adding a new sidekick (e.g., when installing a third-party-provided one). |
Not saying it must be APCu, and it would also be fine to start off with string-only support. Just saying that you can't change the API afterwards without a BC break, so the functions/classes added to core should be well named and well thought out, so there's room for extension and optimization.
It is messy since the http-workers can start these background threads at runtime without any mechanism to oversee or stop them. It can become hard to reason about which sidekick workers are currently running. The original concept of 'task workers' had workers call |
Yes, I specifically like this approach more because it can handle many different types of workers (queue consumers, publishers, sidekicks and extension workers) with a unified C/Go API.
I believe the point is to define them as always available in a script's lifetime after being started. A declaration in the Caddyfile would require it to live the entire time and start immediately. I do think we should allow them to communicate with all serializables, though, not just strings, but that could be added later on. |
henderkes left a comment
The implementation generally seems solid except for a few nitpicks, but in general the more I think about this, the less I'm a fan of the extra API for it.
I'd like to pick the set_vars and get_vars, add them to the task worker PR and generally give task workers the ability to establish bi-directional communication with http threads/workers.
    // SidekickEntrypoint is the script used to start sidekicks (e.g., bin/console)
    SidekickEntrypoint string `json:"sidekick_entrypoint,omitempty"`
A big issue I see with this is that with this setting being necessary on the Caddy side, there's comparatively little benefit to starting the sidekicks/workers from php.
This is analogous to `worker { file ... }` entries: HTTP workers also need their entrypoint in config. It's a configure-once setting that goes into config templates and is forgotten. What's important to understand is that PHP apps can't reliably guess their sidekick entrypoint from the HTTP context.
What's important to understand is that PHP apps can't reliably guess their sidekick entrypoint from the HTTP context.
But why not, though? I feel like defining it in the Caddyfile immediately invalidates the biggest benefit to being able to control it from php.
1. A sidekick runs its own event loop (subscribe to Redis, watch files, poll an API, etc.)
2. It calls `frankenphp_sidekick_set_vars()` to publish key-value pairs
3. HTTP workers call `frankenphp_sidekick_get_vars()` to read the latest snapshot
4. The first `get_vars` call **blocks until the sidekick has published**: no startup race condition
I don't think this is necessary to prevent race conditions, because workers (thinking in general task worker terms here, unless otherwise specified) need to reach ready state before any requests are served.
Additionally, we could then rename this to
frankenphp_get_vars(?string $workerName = null, ?float $timeout = null): array; // name of the sidekick
in case of explicit polling.
Sidekicks don't have a "ready state" in the traditional worker sense. They don't wait for work to arrive. They run their own event loop (Redis pub/sub, file watching, etc.). The set_vars call IS the ready signal: "I've observed the environment and here's the initial state." The blocking get_vars ensures HTTP workers never see an uninitialized state. Without it, the first few requests would get empty/default values, which is the exact race condition this design eliminates.
| } | ||
| ``` | ||
|
|
||
| Each `php_server` block has its own isolated sidekick scope. |
frankenphp.c
Outdated
    if (!is_http_thread) {
        return retval;
    }
I'm not a fan of this, it's better to have it explicitly reach frankenphp_handle_task.
Not sure I get your point. Sidekick scripts don't serve HTTP requests, they run a continuous loop. HTTP request startup/shutdown (output buffering, session handling, etc.) is meaningless for them and would add overhead.
HTTP request startup/shutdown (output buffering, session handling, etc.) is meaningless for them and would add overhead.
Of course, but I think there should be an explicit "ready" signal from the sidekick, rather than an immediate ready marker. I suppose the frankenphp_set_vars call is one.
|
Thanks for the review comments. About the remaining discussion points:
The goal is for app developers to stay in control. Requiring both a PHP file and a Caddyfile entry for each sidekick is the wrong priority. With the app itself, it's three places to maintain for what should be an app-level concern. With PHP-driven start, adding a sidekick is just code: write the command, call get_vars('my-sidekick') from the worker bootstrap. No config change, no deployment diff. Third-party packages can ship their sidekicks; install the package, done. If someone needs to list running sidekicks, that's something the PHP app itself can expose (a debug command, a health endpoint). It doesn't need to be in the infrastructure config.
String-only is a deliberate choice, not a limitation. Implicit serialization adds overhead, complexity, and a class of bugs: unserialize failures, class loading issues, exception confusion. APCu has had specific bugs from this in the past. For config publishing (which is what sidekicks do), strings are the right primitive. If structured data is needed, explicit encoding is better. Think SRP. For future task workers, serialization could make more sense (although personally I think this breaks SRP and APCu shouldn't have it) but should be designed separately with its own constraints.
The non-HTTP worker foundation is already shared: same thread model, same crash recovery, same shutdown mechanism, same sidekick_entrypoint. The communication patterns are what differ: set_vars/get_vars for config publishing vs send_task/receive_task for work dispatch. These are complementary, not competing. A future task worker PR can reuse the same foundation and add its own communication primitives without touching the sidekick API. Forcing everything into one handle_task model would turn event-driven patterns (Redis pub/sub, file watching) into polling. Which is exactly what sidekicks exist to avoid.
I don't think task workers would solve new problems for the PHP ecosystem. We already have mature patterns for background work: Symfony Messenger, Laravel Queues, etc. These are actually better than in-process task workers because they offload work to separate processes, keeping the HTTP app focused on what it's for: serving users fast.
Sidekicks are the real frontier. The fundamental limitation of PHP's request/response model today is that apps can't receive pushed configuration updates. Every request has to pull its config: check Redis Sentinel, pull vaults, evaluate feature flags. There's no way around it currently. Sidekicks change this: a background worker subscribes to changes and publishes them, so HTTP requests read pre-computed, always-fresh values with zero overhead. That's a capability PHP has never had before.
Of course, that's my take and I won't object to tasks being handled separately, but to me the focus here should be on getting sidekicks right, not on building a general task system that duplicates what existing queue libraries already do well. |
|
Recent changes:
Sidekick lifecycle is now separated from HTTP worker lifecycle:
Readiness =
I then refactored I optimized And there are new internal tests in |
👍 👍 👍 👍
This needs to be changed. It's important that sidekicks restart on worker restart too, because the worker restart may be triggered to clear opcache safely (PR open, not yet merged) or to reload changed PHP files.
That's what I wanted!
Need to take a look later. I think most of my points have been addressed, but the last one I'd like to see resolved is: https://github.com/php/frankenphp/pull/2287/changes#r2952125541 Do we need an explicit entrypoint directive? Why can't the PHP logic define the entry point (as long as it's within the php_server's root)? |
Good call, PR updated! All green, except unrelated failures (could anyone maybe restart the failing job?)
Right, that's the last unanswered item. Here is my take: Letting PHP define the entrypoint creates several problems:
The value proposition for the community is simple: add one line to your Caddyfile template, then everything else is pure PHP. The entrypoint never changes, it's the app's CLI runner. What sidekicks to start is the app's decision. For Symfony: I'm just wondering if it'd make sense to turn |
Counterpoint: right now we have infrastructure carrying an application detail, and our application code depends on infrastructure configuration. And while the single script is fine for Symfony, there are still plenty of projects that use more of a concerns-separated-by-folder structure.
Don't worry about it, this happens on most CI runs. CI is green.
This is kind of true, but only because there's no better alternative to define this in the application. Personally I strongly believe in the model where the application should be the web server, but that's just not really a possibility in PHP yet. But where possible, I'd keep it in the application. |
|
Fair points. I get that tension. The worker For projects with per-folder structures, I hear the "application as web server" vision. We're not there yet in PHP, but sidekicks are a step in that direction: the app decides what runs, infrastructure just provides the execution context. The entrypoint is the last piece that stays on the infra side. I think that's the right boundary for now. The amphp project has building blocks for writing HTTP servers in pure PHP but the frankenphp model looks way more robust. The Did we reach an agreement on this PR, with just a final review needed? |
With workers however, you can define multiple in multiple files or with multiple environments. With sidekicks, this wouldn't work. I actually "maintain" an application that has multiple workers defined (think
I don't think I'm convinced here, but I'm not hard-vetoing anything, assuming the previous concern (see above) is addressed. I do believe that defining them within application code is the better idea, though. I don't think security is really an argument, given that the one script defined in infrastructure could do everything other files do, too.
Swoole is quite advanced, but the general paradigm and libraries just aren't there yet. That's one of the reasons why I started contributing to FrankenPHP: I find it moves the PHP ecosystem in the right direction while making the compromises needed to make it happen. I found myself having to choose between web servers in their infancy and tried them all; it succeeded where Swoole, AmPhp/ReactPhp and Roadrunner fail.
Mostly, I think. But I believe @AlliBalliBaba is most qualified to further handle feedback for this PR and the ultimate decision regarding such an extensive proposition lies with @dunglas. I haven't paid as much attention to the task worker background API as I should have, so I don't have the full picture on how to best design this approach. Also quite busy with work, so I only have chunks of time to throw at this while also juggling my other open source projects. |
|
I'm very excited by this proposal, but I will not have the time to thoroughly review this PR until next week. I'll likely raise some concerns with the proposed API/naming (it looks a bit specific to me, I'm pretty sure we can achieve something more generic with some small adjustments), but give me some time to dive deeper into the proposal. |
|
I still dislike the points already mentioned in the design itself. The API allows for too many footguns. Here's what needs to be addressed IMO.
|
|
Just as an example, this is how I imagine a configuration might look. It feels cleaner to have a central place where these background workers are defined. Each background worker becomes 'ready' once it first listens for the ping coming from the Go side.
frankenphp {
    sidekick_worker /path/to/redis_discoverer.php {
        ping 100ms # every 100ms
    }
    sidekick_worker /path/to/secret_vault.php {
        ping 5s # every 5s
    }
    sidekick_worker /path/to/cron.php {
        ping 60s "--domain=example.com" # every 60s with argv/argc or by just passing on the string
    }
}
# redis_discoverer.php
$discoverRedisEndpoint = function() {
    $redisEndpoint = ...;
    frankenphp_global_set('redisEndpoint', $redisEndpoint);
};
$discoverRedisEndpoint(); # initial discovery before entering the loop
while (frankenphp_handle_task($discoverRedisEndpoint)) { # 'ready' once we reach here
    # gets pinged periodically
}
With 'sidekick workers' a task is just scheduled periodically. But this way you're also potentially allowing an HTTP worker to schedule a task directly. |
Add support for "sidekick" workers: long-running PHP scripts that run outside the HTTP request cycle, observe their environment, and publish configuration to HTTP workers in real time.
This enables patterns like Redis Sentinel discovery, secret rotation, feature flag streaming, and cache invalidation — without polling, TTLs, or redeployment.
New PHP functions
`frankenphp_sidekick_get_vars(string|array $name, float $timeout = 30.0): array`
Starts a sidekick and returns its published variables. The first call blocks until the sidekick calls `set_vars()` or the timeout expires. Subsequent calls return the latest snapshot immediately. When given an array of names, all sidekicks are started in parallel and vars are returned keyed by name. Works in both worker and non-worker mode.
`frankenphp_sidekick_set_vars(array $vars): void`
Publishes a snapshot of variables from inside a sidekick script. All keys and values must be strings. Each call replaces the entire snapshot atomically. Can only be called from a sidekick context.
`frankenphp_sidekick_should_stop(): bool`
Cooperative shutdown check. Sidekick scripts poll this in their event loop to exit gracefully when FrankenPHP shuts down. Can only be called from a sidekick context.
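The cooperative shutdown check described above follows a common pattern: the host flips a flag, and the long-running loop polls it between iterations so the current step always finishes. A minimal Go sketch of that pattern, with hypothetical names (`Stopper`, `runLoop`) that are not taken from the PR:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Stopper is a hypothetical shutdown flag shared by the host and the loop.
type Stopper struct {
	ch   chan struct{}
	once sync.Once
}

func NewStopper() *Stopper {
	return &Stopper{ch: make(chan struct{})}
}

// Stop requests shutdown; calling it more than once is harmless.
func (s *Stopper) Stop() {
	s.once.Do(func() { close(s.ch) })
}

// ShouldStop reports whether shutdown was requested, without blocking.
func (s *Stopper) ShouldStop() bool {
	select {
	case <-s.ch:
		return true
	default:
		return false
	}
}

// runLoop runs step until shutdown is requested and returns the number of
// completed iterations. A step is never interrupted midway.
func runLoop(s *Stopper, step func()) int {
	iterations := 0
	for !s.ShouldStop() {
		step()
		iterations++
	}
	return iterations
}

func main() {
	s := NewStopper()
	go func() {
		time.Sleep(55 * time.Millisecond)
		s.Stop() // what the server would do on shutdown
	}()
	n := runLoop(s, func() { time.Sleep(10 * time.Millisecond) })
	fmt.Println("iterations before graceful exit:", n)
}
```

The trade-off is latency: shutdown takes at most one step duration, so steps that block for a long time (for example a blocking read) need their own timeout.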
Caddyfile configuration
How it works
Design highlights
- `get_vars` blocks until the sidekick has published its initial state
- `set_vars` replaces all vars at once: no partial state
- `$_SERVER` injection
- `set_vars` and `should_stop` throw if not called from a sidekick
- `get_vars` from multiple HTTP workers: only one starts the sidekick
- `get_vars(['a', 'b'])` starts all sidekicks concurrently
- `php_server` scoping: each `php_server` block has its own `SidekickRegistry`, so different apps on the same Caddy instance are fully isolated
- `function_exists('frankenphp_sidekick_get_vars')` lets the same code work with or without FrankenPHP
- `get_vars` works from any PHP script served by `php_server`
- `bin/console` compatible: sidekick name is available as `$_SERVER['argv'][1]` and `$_SERVER['FRANKENPHP_SIDEKICK_NAME']`
Runtime behavior
- `SCRIPT_FILENAME` is set correctly for non-`.php` entrypoints
- `#!/usr/bin/env php`) are silently skipped